AITopics | fault injection

fault injection

Fault injection analysis of Real NVP normalising flow model for satellite anomaly detection

Greco, Gabriele, Cena, Carlo, Albertin, Umberto, Martini, Mauro, Chiaberge, Marcello

arXiv.org Artificial Intelligence, Nov-18-2025

Satellites are used for a multitude of applications, including communications, Earth observation, and space science. Neural networks and deep learning-based approaches now represent the state of the art for enhancing the performance and efficiency of these tasks. Given that satellites are susceptible to various faults, one critical application of Artificial Intelligence (AI) is fault detection. However, despite the advantages of neural networks, these systems are vulnerable to radiation errors, which can significantly impact their reliability. Ensuring the dependability of these solutions requires extensive testing and validation, particularly using fault injection methods. This study analyses a physics-informed (PI) real-valued non-volume preserving (Real NVP) normalizing flow model for fault detection in space systems, with a focus on resilience to Single-Event Upsets (SEUs). We present a customized fault injection framework in TensorFlow to assess neural network resilience. Fault injections are applied through two primary methods: Layer State injection, targeting internal network components such as weights and biases, and Layer Output injection, which modifies layer outputs across various activations. Fault types include zeros, random values, and bit-flip operations, applied at varying levels and across different network layers. Our findings reveal several critical insights, such as the significance of bit-flip errors in critical bits, which can lead to substantial performance degradation or even system failure. With this work, we aim to exhaustively study the resilience of Real NVP models against errors due to radiation, providing a means to guide the implementation of fault tolerance measures.

artificial intelligence, deep learning, machine learning, (18 more...)

doi: 10.1109/IJCNN64981.2025.11227924

2504.02015

Country: North America > United States > Colorado (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
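
As a rough illustration of the three fault types named in the abstract above (zeros, random values, and bit flips) applied to stored weights, the following Python sketch corrupts a list of float32 values. It is not the authors' TensorFlow framework: the names `flip_bit` and `inject`, the float32 storage format, the uniform range [-1, 1] for random faults, and the per-weight injection probability `rate` are all assumptions made for the example.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value`.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign (IEEE 754 binary32 layout).
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def inject(weights, fault, rate, rng):
    """Return a copy of `weights` with each entry faulted with probability `rate`.

    `fault` is one of "zero", "random", or "bitflip" (hypothetical labels).
    """
    out = []
    for w in weights:
        if rng.random() >= rate:
            out.append(w)  # this weight is left untouched
        elif fault == "zero":
            out.append(0.0)
        elif fault == "random":
            out.append(rng.uniform(-1.0, 1.0))  # assumed value range
        elif fault == "bitflip":
            out.append(flip_bit(w, rng.randrange(32)))
        else:
            raise ValueError(f"unknown fault type: {fault}")
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a campaign can be repeated
    weights = [0.5, -0.25, 1.0, 0.125]
    for fault in ("zero", "random", "bitflip"):
        print(fault, inject(weights, fault, rate=0.5, rng=rng))
```

Flipping a high exponent bit shows why some bits matter more than others: `flip_bit(1.0, 30)` yields infinity, whereas `flip_bit(1.0, 0)` changes the value only in its last mantissa digit.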

LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost

Kikuta, Daisuke, Ikeuchi, Hiroki, Tajiri, Kengo

arXiv.org Artificial Intelligence, Nov-12-2025

Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. It involves intentionally injecting faults into a system to test its resilience, uncover weaknesses, and address them before they cause failures in production. Recent CE tools automate the execution of predefined CE experiments. However, planning such experiments and improving the system based on the experimental results still remain manual. These processes are labor-intensive and require multi-domain expertise. To address these challenges and enable anyone to build resilient systems at low cost, this paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). It predefines an agentic workflow according to a systematic CE cycle and assigns subdivided processes within the workflow to LLMs. ChaosEater targets CE for software systems built on Kubernetes. Therefore, the LLMs in ChaosEater complete CE cycles through software engineering tasks, including requirement definition, code generation, testing, and debugging. We evaluate ChaosEater through case studies on small- and large-scale Kubernetes systems. The results demonstrate that it consistently completes reasonable CE cycles with significantly low time and monetary costs. Its cycles are also qualitatively validated by human engineers and LLMs.

large language model, machine learning, natural language, (21 more...)

2511.07865

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Services (0.69)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

GoldenTransformer: A Modular Fault Injection Framework for Transformer Robustness Research

Howard, Luke

arXiv.org Artificial Intelligence, Sep-16-2025

Transformers have become the foundation for a wide range of state-of-the-art models across natural language processing, computer vision, and other machine learning domains. Despite their widespread deployment, the robustness of these models under fault conditions remains underexplored. We present GoldenTransformer, a modular and extensible fault injection framework designed to evaluate the resiliency of Large Language Models to induced hardware faults. GoldenTransformer offers a unified Python-based platform for injecting diverse classes of faults, such as weight corruption, activation injections, and attention-level disruptions, into pretrained transformer-based models. Inspired by the GoldenEye simulator for DNNs, our framework focuses on the unique challenges of working with large transformer architectures, including considerations such as structural complexity, latent dependencies, and nonuniform layer definitions. GoldenTransformer is built atop PyTorch and HuggingFace Transformers, and it supports experiment reproducibility, metric logging, and visualization out of the box. We detail the technical design and use of GoldenTransformer and demonstrate it through several example experiments on classification and generation tasks. By enabling controlled injection of faults at multiple logical and structural points in a transformer, GoldenTransformer offers researchers and practitioners a valuable tool for model robustness analysis and for guiding dependable system design in real-world LLM applications.

large language model, machine learning, natural language, (19 more...)

2509.1079

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AI Safety Assurance in Electric Vehicles: A Case Study on AI-Driven SOC Estimation

Skoglund, Martin, Warg, Fredrik, Mirzai, Aria, Thorsen, Anders, Lundgren, Karl, Folkesson, Peter, Havers-zulka, Bastian

arXiv.org Artificial Intelligence, Sep-4-2025

Integrating Artificial Intelligence (AI) technology in electric vehicles (EV) introduces unique challenges for safety assurance, particularly within the framework of ISO 26262, which governs functional safety in the automotive domain. Traditional assessment methodologies are not geared toward evaluating AI-based functions and require evolving standards and practices. This paper explores how an independent assessment of an AI component in an EV can be achieved when combining ISO 26262 with the recently released ISO/PAS 8800, whose scope is AI safety for road vehicles. The AI-driven State of Charge (SOC) battery estimation exemplifies the process. Key features relevant to the independent assessment of this extended evaluation approach are identified. As part of the evaluation, robustness testing of the AI component is conducted using fault injection experiments, wherein perturbed sensor inputs are systematically introduced to assess the component's resilience to input variance.

artificial intelligence, machine learning, prediction, (14 more...)

2509.0327

Country: Europe > Germany (0.28)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Efficient Triple Modular Redundancy for Reliability Enhancement of DNNs Using Explainable AI

Soroush, Kimia, Shirazi, Nastaran, Raji, Mohsen

arXiv.org Artificial Intelligence, Jul-15-2025

Deep Neural Networks (DNNs) are widely employed in safety-critical domains, where ensuring their reliability is essential. Triple Modular Redundancy (TMR) is an effective technique to enhance the reliability of DNNs in the presence of bit-flip faults. To limit the significant overhead of TMR, it is applied selectively to the parameters and components that contribute most to the model output. Hence, the accuracy of the selection criterion plays a key role in the efficiency of TMR. This paper presents an efficient TMR approach to enhance the reliability of DNNs against bit-flip faults using an Explainable Artificial Intelligence (XAI) method. Since XAI can provide valuable insights about the importance of individual neurons and weights in the performance of the network, it can be applied as the selection metric in TMR techniques. The proposed method utilizes a low-cost, gradient-based XAI technique known as Layer-wise Relevance Propagation (LRP) to calculate importance scores for DNN parameters. These scores are then used to enhance the reliability of the model, with the most critical weights being protected by TMR. The proposed approach is evaluated on two DNN models, VGG16 and AlexNet, using datasets such as MNIST and CIFAR-10. The results demonstrate that the method can protect the AlexNet model at a bit error rate of 10^-4, achieving over 60% reliability improvement while maintaining the same overhead as state-of-the-art methods.

artificial intelligence, machine learning, overhead, (16 more...)

2507.08829

Country: Asia > Middle East > Iran (0.15)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
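
The selective TMR idea summarised in the abstract above (triplicate only the most important weights and recover them by majority vote) can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the class `SelectiveTMR`, the helper names, the float32 storage format, and the `budget` parameter are hypothetical, and the importance scores are simply passed in as numbers rather than computed with LRP.

```python
import struct


def to_bits(value: float) -> int:
    """Return the float32 bit pattern of `value` as an unsigned integer."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits


def from_bits(bits: int) -> float:
    """Decode a 32-bit pattern back into a float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits))
    return value


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def select_critical(scores, budget):
    """Indices of the `budget` weights with the highest importance scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:budget])


class SelectiveTMR:
    """Weight store that keeps two extra replicas of the selected weights only."""

    def __init__(self, weights, scores, budget):
        self.weights = [to_bits(w) for w in weights]  # primary copy
        self.replicas = {
            i: [self.weights[i], self.weights[i]]
            for i in select_critical(scores, budget)
        }

    def read(self, i: int) -> float:
        bits = self.weights[i]
        if i in self.replicas:  # protected weight: vote across three copies
            r1, r2 = self.replicas[i]
            bits = majority(bits, r1, r2)
        return from_bits(bits)


if __name__ == "__main__":
    tmr = SelectiveTMR([0.5, -2.0, 0.125], scores=[0.1, 0.9, 0.3], budget=1)
    tmr.weights[1] ^= 1 << 30  # bit flip in the protected weight
    tmr.weights[0] ^= 1 << 31  # bit flip in an unprotected weight
    print(tmr.read(1), tmr.read(0))
```

The vote masks a fault in any single copy of a protected weight, while unprotected weights return whatever is stored. The memory overhead grows with `budget`, which is why the quality of the selection metric matters.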

ROS Help Desk: GenAI Powered, User-Centric Framework for ROS Error Diagnosis and Debugging

Katuwandeniya, Kavindie, Widhanapathirana, Samith Rajapaksha Jayasekara

arXiv.org Artificial Intelligence, Jul-11-2025

As robotics systems become increasingly integrated into daily life, from smart home assistants to the new wave of industrial automation systems (Industry 4.0), there is a growing need to bridge the gap between complex robotic systems and everyday users. The Robot Operating System (ROS) is a flexible framework often utilised in writing robot software, providing tools and libraries for building complex robotic systems. However, ROS's distributed architecture and technical messaging system create barriers for understanding robot status and diagnosing errors. This gap can lead to extended maintenance downtimes, as users with limited ROS knowledge may struggle to quickly diagnose and resolve system issues. Moreover, this deficit in expertise often delays proactive maintenance and troubleshooting, further increasing the frequency and duration of system interruptions. ROS Help Desk provides intuitive error explanations and debugging support, dynamically customized to users of varying expertise levels. It features user-centric debugging tools that simplify error diagnosis, implements proactive error detection capabilities to reduce downtime, and integrates multimodal data processing for comprehensive system state understanding across multi-sensor data (e.g., lidar, RGB). Qualitative and quantitative testing with artificially induced errors demonstrates the system's ability to proactively and accurately diagnose problems, ultimately reducing maintenance time and fostering more effective human-robot collaboration.

large language model, machine learning, natural language, (20 more...)

2507.07846

Country: Oceania > Australia (0.14)

Genre: Research Report (0.40)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection

Cotroneo, Domenico, De Rosa, Giuseppe, Liguori, Pietro

arXiv.org Artificial Intelligence, May-12-2025

It mentions modifying the put method and altering the release mechanism, leading to potential issues such as deadlocks or inconsistent states, but avoids specifying exact code lines. This level provides testers with a broader understanding of the fault's behavior and consequences. In the High-Level Description (bottom right), we make the description entirely abstract and omit technical or contextual details about the specific fault. Modifying the put method introduces a "wrong algorithm small sparse modifications fault" in the fault-free function. This description suits scenarios where a conceptual understanding of the fault type is sufficient without providing implementation specifics. A team of six researchers specialized in computer engineering and cybersecurity created and validated the fault descriptions, under the coordination of a full professor with extensive expertise in software testing and fault injection. The professor established the description style, while the postdoctoral researcher, with a PhD in information technologies and background in AI and fault injection, provided ongoing reviews and feedback. The team, which also included a PhD student in cybersecurity and four M.Sc.

machine learning, natural language, programming language, (18 more...)

2505.05777

Country: North America > United States (0.29)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

A Case Study on the Application of Digital Twins for Enhancing CPS Operations

Muntean, Irina, Frasheri, Mirgita, Munaro, Tiziano

arXiv.org Artificial Intelligence, May-8-2025

To ensure the availability and reduce the downtime of complex cyber-physical systems across different domains, e.g., agriculture and manufacturing, fault tolerance mechanisms are implemented that are complex in both their development and operation. In addition, cyber-physical systems are often confronted with limited hardware resources or are legacy systems, both of which often hinder the addition of new functionalities directly on the onboard hardware. Digital Twins can be adopted to offload expensive computations, as well as to provide support through fault tolerance mechanisms, thus decreasing costs and operational downtime of cyber-physical systems. In this paper, we show the feasibility of a Digital Twin used for enhancing cyber-physical system operations, specifically through functional augmentation and increased fault tolerance, in an industry-oriented use case.

artificial intelligence, digital twin, fault tolerance, (15 more...)

doi: 10.4204/EPTCS.418.3

2505.04323

Country:

Europe (0.47)
North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Information Technology (1.00)
Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

ADDT -- A Digital Twin Framework for Proactive Safety Validation in Autonomous Driving Systems

Yu, Bo, Yuan, Chaoran, Wan, Zishen, Tang, Jie, Kurdahi, Fadi, Liu, Shaoshan

arXiv.org Artificial Intelligence, Apr-15-2025

Autonomous driving systems continue to face safety-critical failures, often triggered by rare and unpredictable corner cases that evade conventional testing. We present the Autonomous Driving Digital Twin (ADDT) framework, a high-fidelity simulation platform designed to proactively identify hidden faults, evaluate real-time performance, and validate safety before deployment. ADDT combines realistic digital models of driving environments, vehicle dynamics, sensor behavior, and fault conditions to enable scalable, scenario-rich stress-testing under diverse and adverse conditions. It supports adaptive exploration of edge cases using reinforcement-driven techniques, uncovering failure modes that physical road testing often misses. By shifting from reactive debugging to proactive simulation-driven validation, ADDT enables a more rigorous and transparent approach to autonomous vehicle safety engineering. To accelerate adoption and facilitate industry-wide safety improvements, the entire ADDT framework has been released as open-source software, providing developers with an accessible and extensible tool for comprehensive safety testing at scale.

addt, artificial intelligence, scenario, (15 more...)

2504.09461

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Architecture (1.00)

Evaluating Single Event Upsets in Deep Neural Networks for Semantic Segmentation: an embedded system perspective

Gutiérrez-Zaballa, Jon, Basterretxea, Koldo, Echanobe, Javier

arXiv.org Artificial Intelligence, Dec-4-2024

As the deployment of artificial intelligence (AI) algorithms at edge devices becomes increasingly prevalent, enhancing the robustness and reliability of autonomous AI-based perception and decision systems is becoming as relevant as precision and performance, especially in application areas considered safety-critical such as autonomous driving and aerospace. This paper delves into the robustness assessment of embedded Deep Neural Networks (DNNs), particularly focusing on the impact of parameter perturbations produced by single event upsets (SEUs) on convolutional neural networks (CNNs) for image semantic segmentation. By scrutinizing the layer-by-layer and bit-by-bit sensitivity of various encoder-decoder models to soft errors, this study thoroughly investigates the vulnerability of segmentation DNNs to SEUs and evaluates the consequences of techniques like model pruning and parameter quantization on the robustness of compressed models aimed at embedded implementations. The findings offer valuable insights into the mechanisms underlying SEU-induced failures that allow for evaluating the robustness of DNNs once trained in advance. Moreover, based on the collected data, we propose a set of practical lightweight error mitigation techniques with no memory or computational cost suitable for resource-constrained deployments. The code used to perform the fault injection (FI) campaign is available at https://github.com/jonGuti13/TensorFI2 , while the code to implement proposed techniques is available at https://github.com/jonGuti13/parameterProtection .

artificial intelligence, deep learning, machine learning, (20 more...)

doi: 10.1016/j.sysarc.2024.103242

2412.0363

Country:

Europe > Spain > Basque Country (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
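
The bit-by-bit sensitivity analysis described in the abstract above can be illustrated with a toy campaign: flip each of the 32 bit positions in each weight of a single dot product and record the worst output deviation per position. This is a minimal sketch under stated assumptions, not the TensorFI2-based code linked in the abstract: the functions `flip_bit` and `bit_sensitivity`, the float32 storage format, and the use of a plain dot product in place of a segmentation network are all hypothetical simplifications.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of the float32 encoding of `value`."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def bit_sensitivity(weights, inputs):
    """Worst absolute output error per bit position for single-bit faults.

    Entry `b` of the result is the largest |faulty - golden| observed when
    bit `b` is flipped in any one weight. A NaN result is counted as an
    unbounded (infinite) error.
    """

    def dot(ws):
        return sum(w * x for w, x in zip(ws, inputs))

    golden = dot(weights)  # fault-free reference output
    worst = []
    for bit in range(32):
        max_err = 0.0
        for i, w in enumerate(weights):
            faulty = list(weights)
            faulty[i] = flip_bit(w, bit)
            err = abs(dot(faulty) - golden)
            if math.isnan(err):
                err = math.inf
            max_err = max(max_err, err)
        worst.append(max_err)
    return worst


if __name__ == "__main__":
    profile = bit_sensitivity([0.5, -0.25, 1.0], [1.0, 2.0, 3.0])
    for bit in (0, 22, 23, 30, 31):
        print(f"bit {bit:2d}: worst error {profile[bit]}")
```

In this toy setting the low mantissa bits barely move the output, the sign bit changes it by a bounded amount, and the top exponent bit can drive it to infinity. A profile of this kind is what would motivate protecting only a few high-order bits.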
